Trends in Spam Products and Methods

Authors

  • Geoff Hulten
  • Anthony Penta
  • Gopalakrishnan Seshadrinathan
  • Manav Mishra
Abstract

In this paper we analyze a very large junk e-mail corpus which was generated by a hundred thousand volunteer users of the Hotmail e-mail service. We describe how the corpus is being collected and then discuss how both the products being advertised by spam and the specific exploits being used to avoid spam filters have changed over time. Every day we randomly select one message from the mail stream of each Hotmail volunteer and ask that user to classify it for us. Thanks to these users, we have been receiving tens of thousands of hand-classified messages per day, every day for the past year; our database currently contains over ten million classified messages. In this paper we further analyze two samples of the spam from this data, one from early 2003 and one from early 2004. We categorized the spam by the type of product it is selling and by the types of exploits it uses to avoid spam filters. We are aware of very few other large-scale studies of spam. One is the FTC report on false claims in spam [1]. Our study differs by using data sets that were created by randomly sampling over the entire mail stream, rather than by relying on users to report e-mail that offended them; by reporting changes in spam data over time; and by reporting on more categories of spammer exploits. Another relevant large-scale study is our analysis of the geographic origins of spam [2].

Similar resources

Presenting a Suitable Method for Classifying Advertising E-mails Based on User Profiles

In general, whether a message is spam depends on whether it satisfies the recipient, not on the content of the message. Under this definition, problems arise in marketing and advertising: for example, some advertising emails may be spam for some users and not spam for others. To deal with this problem, many researchers design an anti-s...

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Email is one of the fastest and most economical means of communication. The growing number of email users has increased the volume of spam in recent years. Spam not only costs users money, time, and bandwidth, but has also become a risk to the efficiency, reliability, and security of networks. Spam developers are always trying to find ways to escape the existing filters; therefore, new filters to de...

A Critical Analysis of Financial Fraud Spam in English in Terms of Persuasive Strategies: Personalization, Presupposition, and Lexical Choices

The term ‘spam’ addresses unsolicited emails sent in bulk; therefore, the term ‘financial fraud spam’ refers to unwanted bulk emails in which different tricks and techniques are employed to swindle money from the recipients. Estimates show that more than 80% of worldwide email traffic in 2011 was spam. It should be noted that while the number of daily spam emails in 2002 was 2.4 billion, this numbe...

Spam Construction Trends

This paper replicates and extends Observed Trends in Spam Construction Techniques: A Case Study of Spam Evolution. A corpus of 169,274 spam emails was collected over a period of five years. Each spam email was tested for construction techniques using SpamAssassin’s spamicity tests. The results of these tests were collected in a database. Formal definitions of Pu and Webb’s co-existence, extincti...

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is unwanted email that is harmful to communications around the world. It is a growing problem for personal email, so detecting it is essential. Machine learning is well suited to this problem because its adaptive nature lets it learn the patterns required for classification. Nonetheless, in spam detection, there are a large num...

Publication date: 2004